A Word Embeddings based Approach for Author Profiling: Gender and Age Prediction

Authors

Abstract

Author Profiling (AP) is the task of identifying an author's demographic profile, such as age, gender, location, native language and personality traits, by processing the texts they have written. AP techniques are used in many applications, including literary research, marketing, forensics and security. By analysing datasets, researchers have identified various differences in authors' writing styles, and these differences are represented as stylistic features. Researchers have extracted several kinds of style-based features, such as structural, content, word, character, syntactic, readability and semantic features, to recognize authors. Traditionally, feature combinations were used to differentiate authors. Several existing works used Machine Learning (ML) methods to predict the characteristics of a new author, achieving good accuracies by considering both the choice of ML algorithm and the feature combination. Recently, with the advent of Deep Learning (DL), researchers have proposed author profiling approaches based on DL techniques, and a few of these works reported that deep learning achieved better prediction performance than ML methods. In this work, a word embeddings based approach is proposed for gender and age prediction. Experiments are conducted with different word embedding models, namely Word2Vec, GloVe, FastText and BERT, to generate word vectors. The documents are then converted into vectors by a document representation technique that uses these word vectors. The document vectors are passed to three ML algorithms, Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Logistic Regression (LR), to train the model. With this model, the XGBoost classifier achieved higher prediction accuracy than the other algorithms. The approach was implemented on the PAN 2014 competition Reviews dataset and attained the best performance for AP.
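
The pipeline summarized above runs from word vectors to document vectors to a classifier. It can be illustrated with the following minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not say how word vectors are combined into a document vector, so this sketch assumes simple mean pooling. The embedding table `EMBEDDINGS`, the toy documents and the 0/1 labels are hypothetical. Logistic regression, one of the three classifiers named, is fitted here with plain batch gradient descent. The other two classifiers, XGBoost and Random Forest, would normally come from libraries such as `xgboost.XGBClassifier` and `sklearn.ensemble.RandomForestClassifier` and are not reimplemented.

```python
import math

# Hypothetical toy embedding table standing in for a trained model such as
# Word2Vec, GloVe, FastText or BERT (real vectors have hundreds of dimensions).
EMBEDDINGS: dict[str, list[float]] = {
    "great": [0.9, 0.1],
    "love": [0.8, 0.2],
    "awful": [0.1, 0.9],
    "boring": [0.2, 0.8],
    "product": [0.5, 0.5],
}


def document_vector(text: str, embeddings: dict[str, list[float]]) -> list[float]:
    """Mean-pool the vectors of in-vocabulary tokens (assumed representation)."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to the zero vector
    # zip(*vectors) groups the i-th component of every word vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 500
) -> tuple[list[float], float]:
    """Fit binary logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (predicted probability - label) * input.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    """Return label 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    # Hypothetical training documents with a binary profile label (0 or 1).
    docs = ["great product", "love it", "awful product", "boring"]
    labels = [1, 1, 0, 0]
    X = [document_vector(doc, EMBEDDINGS) for doc in docs]
    w, b = train_logistic_regression(X, labels)
    print(predict(w, b, document_vector("great love product", EMBEDDINGS)))  # prints 1
    print(predict(w, b, document_vector("awful boring product", EMBEDDINGS)))  # prints 0
```

Mean pooling keeps every document vector the same length as the word vectors regardless of document length. This fixed-length input is what lets any of the three classifiers consume it directly. Tokens missing from the vocabulary, such as "it" above, are simply skipped.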

Similar articles

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, native language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

Twitter Author Profiling Using Word Embeddings and Logistic Regression

The general goal of the author profiling task is to determine various social and demographic aspects of the author based on his pieces of writing. In this work, we propose an approach that combines word embeddings and classical logistic regression for identifying author gender and language variety based on the corresponding tweets. The model was trained on PAN 2017 Twitter Corpus that contains ...

Author Profiling: Age Prediction Based on Advanced Bayesian Networks

In this study, we present a new method for profiling the author of an anonymous English text. The aim of author profiling is to determine demographic (age, gender, region, education level) and psychological (personality, mental health) properties of the authors of a text, especially authors of user generated content in social media. To obtain the best classification, authors resort to machine l...

PAN 2017: Author Profiling - Gender and Language Variety Prediction

We present the results of gender and language variety identification performed on the tweet corpus prepared for the PAN 2017 Author profiling shared task. Our approach consists of tweet preprocessing, feature construction, feature weighting and classification model construction. We propose a Logistic regression classifier, where the main features are different types of character and word n-gram...

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction

We present a novel word level vector representation based on symmetric patterns (SPs). For this aim we automatically acquire SPs (e.g., “X and Y”) from a large corpus of plain text, and generate vectors where each coordinate represents the cooccurrence in SPs of the represented word with another word of the vocabulary. Our representation has three advantages over existing alternatives: First, b...

Journal

Journal title: International Journal on Recent and Innovation Trends in Computing and Communication

Year: 2023

ISSN: 2321-8169

DOI: https://doi.org/10.17762/ijritcc.v11i7s.6996